Merged
Branch updated from 22bddfd to 7b312c3.
Pull request overview
This PR addresses silent aborts when running BERT embedding workloads in the browser by aligning JS-selected thread counts with the actual compiled Emscripten pthread pool size, and by making the build/runtime configuration more tolerant of different build modes and toolchains.
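The capping rule in the overview can be illustrated with a small pure function. This is a minimal sketch, not the bridge's actual code: `selectThreadCount` is a hypothetical name, it assumes `0`/missing means "auto", that the hardware hint comes from something like `navigator.hardwareConcurrency`, and that an unknown (or zero) pool size applies no cap.

```javascript
/**
 * Hypothetical sketch: clamp a requested thread count to the pthread pool
 * size the WASM core was compiled with.
 *  - requested: user value, or 0/undefined for "auto"
 *  - poolSize: compiled pool size reported by the core (null if unknown)
 *  - hardwareThreads: e.g. navigator.hardwareConcurrency in a browser
 */
function selectThreadCount(requested, poolSize, hardwareThreads) {
  const isPositiveInt = (n) => Number.isInteger(n) && n > 0;

  // "Auto" mode: start from the hardware hint, or 1 if it is unavailable.
  let threads = isPositiveInt(requested)
    ? requested
    : (isPositiveInt(hardwareThreads) ? hardwareThreads : 1);

  // Never request more threads than the pre-spawned worker pool can serve.
  // Assumption: an unknown or zero pool size applies no cap in this sketch.
  if (isPositiveInt(poolSize)) {
    threads = Math.min(threads, poolSize);
  }
  return threads;
}

// Usage: 8 logical cores, but the core was compiled with a pool of 4.
console.log(selectThreadCount(0, 4, 8)); // 4
```

Keeping the selection pure (no direct access to the Emscripten module or `navigator`) makes the cap easy to unit-test outside the browser.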
Changes:
- Expose the compiled pthread pool size from the WASM core and use it in the JS runtime to cap selected thread counts.
- Make `PTHREAD_POOL_SIZE_STRICT` configurable (defaulting to `0`) to avoid hard aborts on unexpected over-pool requests.
- Make the wasm64 BigInt post-link patch more robust to non-minified/debug-style formatting, and force-enable CMake's pthread probe hint for newer Emscripten toolchains.
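The actual wasm64 BigInt patch target is not shown on this page, so the following only illustrates the general technique of a formatting-tolerant post-link patch: turn a minified snippet into a regex that also matches debug-style output with extra spaces or line breaks. `looseSnippetRegex` and the `Number(y)`/`BigInt(y)` snippet are hypothetical, and `scripts/build_bridge.sh` may implement its regex differently.

```javascript
// Hypothetical sketch: build a RegExp from a literal code snippet that also
// matches non-minified output, where whitespace may surround punctuation.
function looseSnippetRegex(snippet, flags = "") {
  const isWord = (t) => /^[A-Za-z0-9_$]+$/.test(t);
  // Tokens: runs of identifier characters, or single non-space characters.
  const tokens = snippet.match(/[A-Za-z0-9_$]+|\S/g) || [];
  let source = "";
  tokens.forEach((tok, i) => {
    if (i > 0) {
      // Two adjacent word tokens need at least one whitespace character;
      // next to punctuation, whitespace is optional.
      source += isWord(tokens[i - 1]) && isWord(tok) ? "\\s+" : "\\s*";
    }
    // Escape regex metacharacters so the token is matched literally.
    source += tok.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  return new RegExp(source, flags);
}

const re = looseSnippetRegex("var x=Number(y);");
console.log(re.test("var x=Number(y);"));      // true  (minified)
console.log(re.test("var x = Number( y );"));  // true  (debug-style)
console.log(re.test("var x =\n  Number(y);")); // true  (line-wrapped)
console.log(re.test("varx=Number(y);"));       // false

// Applying the (hypothetical) patch to debug-formatted output:
console.log("var x = Number( y );".replace(re, "var x=BigInt(y);")); // var x=BigInt(y);
```

The regex is built without the `g` flag by default so repeated `test()` calls stay stateless; pass `"g"` when every occurrence should be patched.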
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/llama_webgpu_core.cpp | Adds an exported function to report the compiled pthread pool size. |
| js/llama_webgpu_bridge.js | Reads pool size from core via ccall and uses it to cap runtime thread selection. |
| CMakeLists.txt | Adds a strictness knob, exports the new core symbol, defines pool size for compilation, and adds an Emscripten pthread probe cache hint. |
| scripts/build_bridge.sh | Adds env var plumbing for strictness and improves wasm64 BigInt patching via regex. |
| README.md | Documents the new strictness env var and updated pthread pooling behavior. |
| AGENTS.md | Adds local verification guidance and a regression smoke checklist for pthread/BERT cases. |
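For the bridge row above, Emscripten's `ccall(ident, returnType, argTypes, args)` runtime helper can call an exported C function from JS, provided the symbol is exported and `ccall` is available on the module. The sketch below assumes a hypothetical export name `llama_webgpu_pthread_pool_size` (the real symbol is not named on this page) and returns `null` when the export is missing, so an older core build degrades to "pool size unknown" instead of throwing.

```javascript
// Hypothetical export name; the real symbol in src/llama_webgpu_core.cpp may differ.
const POOL_SIZE_SYMBOL = "llama_webgpu_pthread_pool_size";

/**
 * Read the compiled pthread pool size from the core via Emscripten's ccall.
 * Returns a non-negative integer, or null when the value is unavailable.
 */
function readCompiledPoolSize(core) {
  let raw;
  try {
    // ccall(ident, returnType, argTypes, args): no arguments, numeric result.
    raw = core.ccall(POOL_SIZE_SYMBOL, "number", [], []);
  } catch (err) {
    // Missing export, missing ccall, or a core built before this change.
    return null;
  }
  // Normalize a BigInt in case a wasm64 build surfaces the value as an i64.
  const size = typeof raw === "bigint" ? Number(raw) : raw;
  return Number.isInteger(size) && size >= 0 ? size : null;
}

// Usage with stand-in objects instead of a real Emscripten Module:
console.log(readCompiledPoolSize({ ccall: () => 4 })); // 4
console.log(readCompiledPoolSize({}));                 // null
```

Returning `null` rather than a default number keeps "unknown" distinct from a real pool size, which lets the caller decide whether to cap at all.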
Branch updated from 7b312c3 to 8b63078.
Summary
- … `0` as a fallback against hard aborts from unexpected over-pool requests

Fixes #5.
Verification
- `CCACHE_DIR=/private/tmp/llama_web_bridge_issue5_ccache EM_CACHE=/private/tmp/llama_web_bridge_issue5_emcache LLAMA_CPP_DIR=/opt/UnitySrc/personal/llama/llamadart-native/third_party/llama.cpp BUILD_DIR=/private/tmp/llama_web_bridge_issue5_build5 MEM64_BUILD_DIR=/private/tmp/llama_web_bridge_issue5_build5_mem64 OUT_DIR=/private/tmp/llama_web_bridge_issue5_dist6 WEBGPU_BRIDGE_BUILD_MEM64=1 ./scripts/build_bridge.sh`
- `bash -n scripts/build_bridge.sh`
- `node --check js/llama_webgpu_bridge.js`
- `jina-embeddings-v2-small-en-Q2_K.gguf`: direct runtime load/tokenize/embed/embedBatch passed with auto threads capped to 4